Managing the Deluge of Scientific Data

Authors

  • Satya S. Sahoo
  • Amit Sheth
Abstract

Provenance information in eScience is metadata that's critical to effectively manage the exponentially increasing volumes of scientific data from industrial-scale experiment protocols. Semantic provenance, based on domain-specific provenance ontologies, lets software applications unambiguously interpret data in the correct context. The semantic provenance framework for eScience data comprises expressive provenance information and domain-specific provenance ontologies and applies this information to data management. The authors' "two degrees of separation" approach advocates the creation of high-quality provenance information using specialized services. In contrast to workflow engines generating provenance information as a core functionality, the specialized provenance services are integrated into a scientific workflow on demand. This article describes an implementation of the semantic provenance framework for glycoproteomics.

eScience, also known as cyberinfrastructure, represents a paradigm shift in scientific research that lets scientists harness Web-based computing and data resources to achieve their objectives faster, more efficiently, and on an industrial scale. Using remote software and experimental equipment, scientists can not only access but also generate and process data from distributed sources. The resulting data deluge demands computing solutions that can use high-quality metadata, specifically domain-specific provenance information, to automatically interpret, integrate, and process data. Such solutions bring real value to scientists by answering domain-specific queries effectively to support knowledge discovery over large volumes of scientific data. But creating provenance information of the requisite quality in the heterogeneous, distributed, and high-throughput environment of eScience is a daunting challenge.
We argue that incorporating domain knowledge and ontological underpinning in provenance using expressive domain-specific provenance ontologies is an approach equal to the challenge. This semantic provenance imposes a formally defined domain-specific conceptual view on scientific data (domain semantics), mitigates or eliminates terminological heterogeneity, and enables the use of reasoning tools for knowledge discovery. Furthermore, we define a "two degrees of separation" approach for creating semantic provenance using specialized software tools. Unlike many prevalent workflow-engine-centric approaches, these tools refer to domain-specific provenance ontologies to create provenance information and are integrated into a scientific workflow on demand.

We combine the essential aspects of high-quality provenance (characteristics, a representation model, the creation process, and usage) into a single semantic provenance framework. This framework will pave the way for software agents to interpret experimental data unambiguously for effective …

From: Semantic Provenance for eScience, July/August 2008, p. 47.
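The separation described above, where a specialized tool consults a domain-specific provenance ontology and is attached to a workflow only on demand, can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration and none of it comes from the article: the `ONTOLOGY_TERMS` table stands in for a real ontology, the `ex:` terms are invented, and `provenance_service` and `workflow_step` are hypothetical names.

```python
from dataclasses import dataclass, field

# Hypothetical stand-in for a domain-specific provenance ontology: it maps
# the free-text labels that tools emit onto one canonical term each.
ONTOLOGY_TERMS = {
    "mass spec": "ex:MassSpectrometer",
    "ms": "ex:MassSpectrometer",
    "hplc": "ex:LiquidChromatograph",
}


@dataclass
class ProvenanceRecord:
    data_id: str
    triples: list = field(default_factory=list)


def to_ontology_term(label):
    """Resolve a free-text label to its ontology term, or fail loudly."""
    key = label.strip().lower()
    if key not in ONTOLOGY_TERMS:
        raise KeyError(f"no ontology term for {label!r}")
    return ONTOLOGY_TERMS[key]


def provenance_service(data_id, instrument_label):
    """Specialized service: builds provenance using ontology terms only."""
    record = ProvenanceRecord(data_id)
    record.triples.append(
        (data_id, "ex:generatedBy", to_ontology_term(instrument_label))
    )
    return record


def workflow_step(raw_values, provenance=None):
    """A workflow step that knows nothing about the ontology itself.

    The provenance service is optional and plugged in on demand.
    """
    result = {"id": "run-001", "values": sorted(raw_values)}
    record = provenance(result["id"], "Mass Spec") if provenance else None
    return result, record


result, record = workflow_step([3.2, 1.1, 2.7], provenance=provenance_service)
print(record.triples)
```

In this sketch the workflow step is one step removed from the provenance service and two steps removed from the ontology, so the step runs unchanged when no service is supplied. Because differing labels such as "MS" and "mass spec" resolve to the same term, the sketch also hints at how an ontology can reduce terminological heterogeneity.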

Related resources

Lessons in scientific data interoperability: XML and the eMinerals project.

A collaborative environmental eScience project produces a broad range of data, notable as much for its diversity, in source and format, as its quantity. We find that extensible markup language (XML) and associated technologies are invaluable in managing this deluge of data. We describe FoX, a toolkit for allowing Fortran codes to read and write XML, thu...

Big Data Exploration

The Big Data Era. We are now entering the era of data deluge, where the amount of data outgrows the capabilities of query processing technology. Many emerging applications, from social networks to scientific experiments, are representative examples of this deluge, where the rate at which data is produced exceeds any past experience. For example, scientific analysis such as astronomy is soon exp...

Designing and Compiling a model of professional competencies of managers in managing the country's sports crises with data Theory Foundation approach

The real areas of crises in sports testify to the incompetency of managers, therefore, identifying and increasing crisis management capacity is one of the management challenges. The aim of this study was to design and develop a model of managers' competencies in managing national and international sports crises in the country. The research method was exploratory in nature and using ...

Scientific Hypothesis Database

New instruments and techniques used in capturing scientific data are exponentially increasing the volume of data consumed by in-silico research, usually referred to as data deluge. Once captured, scientific data goes through a cleaning workflow before it is ready for the analysis that will eventually confirm the scientist's hypothesis. The whole process is, nevertheless, complex and takes the focus...

Drowning in the Data Deluge

The Data Deluge and its digital enablers are a gargantuan phenomenon in science. There are many accompanying side effects. These range from the evaluations of scientific research and researchers to how we teach mathematics to children. This article takes a close look at these matters.

Publication date: 2008